Ensemble estimation of multivariate f-divergence
f-divergence estimation is an important problem in the fields of information
theory, machine learning, and statistics. While several divergence estimators
exist, relatively few of their convergence rates are known. We derive the MSE
convergence rate for a density plug-in estimator of f-divergence. Then by
applying the theory of optimally weighted ensemble estimation, we derive a
divergence estimator with a convergence rate of O(1/T) that is simple to
implement and performs well in high dimensions. We validate our theoretical
results with experiments.
Comment: 14 pages, 6 figures, a condensed version of this paper was accepted
to ISIT 2014, Version 2: Moved the proofs of the theorems from the main body
to appendices at the en
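The idea behind weighted ensemble estimation can be illustrated with a deliberately simplified toy. Assume (hypothetically) that each base estimator, indexed by a tuning parameter l, has a bias exactly proportional to l and no variance. Weights that sum to one and satisfy sum(w * l) = 0 then cancel that bias term, and the minimum-norm choice keeps the weights small. The function name, the parameter grid, and the single linear bias term are assumptions of this sketch; the bias expansion and constraints derived in the paper are more involved.

```python
def min_norm_weights(params):
    """Minimum-norm weights w with sum(w) = 1 and sum(w * l) = 0.

    Closed form of w = A^T (A A^T)^{-1} b for A = [[1, ..., 1], [l_1, ..., l_L]]
    and b = (1, 0). Requires at least two distinct parameter values.
    """
    n = len(params)
    s1 = sum(params)
    s2 = sum(l * l for l in params)
    det = n * s2 - s1 * s1          # determinant of the 2x2 matrix A A^T
    lam0, lam1 = s2 / det, -s1 / det
    return [lam0 + lam1 * l for l in params]


# Toy base estimators: true value plus a bias that grows linearly in l.
params = [1.0, 1.5, 2.0, 2.5, 3.0]   # hypothetical tuning-parameter grid
truth, bias_slope = 0.7, 0.2          # hypothetical values for illustration
base_estimates = [truth + bias_slope * l for l in params]

weights = min_norm_weights(params)
ensemble = sum(w * e for w, e in zip(weights, base_estimates))
print(weights)
print(ensemble)
```

Every base estimate in this toy is biased, yet the weighted combination recovers the true value up to floating-point error, because the constraints remove the only bias term present.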
Nonparametric Estimation of Distributional Functionals and Applications.
Distributional functionals are integrals of functionals of probability densities and include functionals such as information divergence, mutual information, and entropy. Distributional functionals have many applications in the fields of information theory, statistics, signal processing, and machine learning. Many existing nonparametric distributional functional estimators have either unknown convergence rates or are difficult to implement. In this thesis, we consider the problem of nonparametrically estimating functionals of distributions when only a finite population of independent and identically distributed samples is available from each of the unknown, smooth, d-dimensional distributions. We derive mean squared error (MSE) convergence rates for leave-one-out kernel density plug-in estimators and k-nearest neighbor estimators of these functionals. We then extend the theory of optimally weighted ensemble estimation to obtain estimators that achieve the parametric MSE convergence rate when the densities are sufficiently smooth. These estimators are simple to implement and do not require knowledge of the densities’ support set, in contrast with many competing estimators. The asymptotic distribution of these estimators is also derived.
The utility of these estimators is demonstrated through their application to sunspot image data and neural data measured from epilepsy patients. Sunspot images are clustered by estimating the divergence between the underlying probability distributions of image pixel patches. The problem of overfitting is also addressed in both applications by performing dimensionality reduction via intrinsic dimension estimation and by benchmarking classification via Bayes error estimation.
PhD, Electrical Engineering: Systems, University of Michigan, Horace H. Rackham School of Graduate Studies
http://deepblue.lib.umich.edu/bitstream/2027.42/133394/1/krmoon_1.pd
Meta learning of bounds on the Bayes classifier error
Meta learning uses information from base learners (e.g. classifiers or
estimators) as well as information about the learning problem to improve upon
the performance of a single base learner. For example, the Bayes error rate of
a given feature space, if known, can be used to aid in choosing a classifier,
as well as in feature selection and model selection for the base classifiers
and the meta classifier. Recent work in the field of f-divergence functional
estimation has led to the development of simple and rapidly converging
estimators that can be used to estimate various bounds on the Bayes error. We
estimate multiple bounds on the Bayes error using an estimator that applies
meta learning to slowly converging plug-in estimators to obtain the parametric
convergence rate. We compare the estimated bounds empirically on simulated data
and then estimate the tighter bounds on features extracted from an image patch
analysis of sunspot continuum and magnetogram images.
Comment: 6 pages, 3 figures, to appear in proceedings of 2015 IEEE Signal
Processing and SP Education Worksho
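As one concrete instance of a divergence-based bound on the Bayes error, the classical Bhattacharyya bounds can be evaluated in closed form for two univariate Gaussians with equal variance, where the exact Bayes error is also known. This is only an illustration of how such bounds bracket the error; it is not claimed to be the set of bounds or the estimator used in the paper, and the function names and numeric values are assumptions of the sketch.

```python
import math


def bhattacharyya_bounds(bc, prior=0.5):
    """Lower and upper bounds on the Bayes error from the Bhattacharyya
    coefficient bc, for class priors (prior, 1 - prior)."""
    pq = prior * (1.0 - prior)
    lower = 0.5 - 0.5 * math.sqrt(1.0 - 4.0 * pq * bc * bc)
    upper = math.sqrt(pq) * bc
    return lower, upper


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Two classes N(0, sigma^2) and N(delta, sigma^2) with equal priors.
delta, sigma = 2.0, 1.0                            # hypothetical values
bc = math.exp(-delta ** 2 / (8.0 * sigma ** 2))    # Bhattacharyya coefficient
exact = normal_cdf(-delta / (2.0 * sigma))         # exact Bayes error
lower, upper = bhattacharyya_bounds(bc)
print(lower, exact, upper)
```

In practice the coefficient would be estimated from samples rather than computed from known densities, which is where divergence estimators enter.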
Direct Estimation of Information Divergence Using Nearest Neighbor Ratios
We propose a direct estimation method for Rényi and f-divergence measures
based on a new graph theoretical interpretation. Suppose that we are given two
sample sets $X$ and $Y$, respectively with $N$ and $M$ samples, where
$\eta := M/N$ is a constant value. Considering the $k$-nearest neighbor ($k$-NN)
graph of $Y$ in the joint data set $(X,Y)$, we show that the average powered
ratio of the number of $X$ points to the number of $Y$ points among all $k$-NN
points is proportional to Rényi divergence of $X$ and $Y$ densities. A
similar method can also be used to estimate f-divergence measures. We derive
bias and variance rates, and show that for the class of $\gamma$-Hölder
smooth functions, the estimator achieves the MSE rate of
$O(N^{-2\gamma/(\gamma+d)})$. Furthermore, by using a weighted ensemble
estimation technique, for density functions with continuous and bounded
derivatives of up to the order $d$, and some extra conditions at the support
set boundary, we derive an ensemble estimator that achieves the parametric MSE
rate of $O(1/N)$. Our estimators are more computationally tractable than other
competing estimators, which makes them appealing in many practical
applications.
Comment: 2017 IEEE International Symposium on Information Theory (ISIT
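A rough sketch of the nearest-neighbor ratio idea, for one-dimensional data and brute-force neighbor search, is given below. For each point of the second sample set, it counts how many of its k nearest neighbors in the pooled data come from each set, and averages a power of the scaled ratio. The exact form used here, including the +1 in the denominator and the scaling by the sample-size ratio, is an assumption of this sketch rather than a transcription of the paper, and the function name and test data are hypothetical.

```python
import math
import random


def renyi_nn_ratio(x, y, k=10, alpha=0.5):
    """Sketch of a k-NN ratio estimate of the Renyi divergence of order alpha
    between the density of x and the density of y (alpha != 1)."""
    eta = len(y) / len(x)
    total = 0.0
    for i, yi in enumerate(y):
        # Distances from y[i] to every other pooled point, tagged by source.
        dists = [(abs(yi - xj), "x") for xj in x]
        dists += [(abs(yi - yj), "y") for j, yj in enumerate(y) if j != i]
        dists.sort(key=lambda d: d[0])
        n_x = sum(1 for _, tag in dists[:k] if tag == "x")
        n_y = k - n_x
        # Scaled neighbor ratio acts as a local density-ratio estimate.
        total += (eta * n_x / (n_y + 1)) ** alpha
    return math.log(total / len(y)) / (alpha - 1.0)


random.seed(0)
x_sample = [random.gauss(0.0, 1.0) for _ in range(400)]
y_same = [random.gauss(0.0, 1.0) for _ in range(400)]    # same density as x
y_shift = [random.gauss(2.0, 1.0) for _ in range(400)]   # shifted density

d_same = renyi_nn_ratio(x_sample, y_same)
d_shift = renyi_nn_ratio(x_sample, y_shift)
print(d_same, d_shift)
```

The estimate for the shifted pair should come out clearly larger than for the matched pair. No density estimate is formed at any point, which is the sense in which the method is direct.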
Information Theoretic Structure Learning with Confidence
Information theoretic measures (e.g. the Kullback-Leibler divergence and
Shannon mutual information) have been used for exploring possibly nonlinear
multivariate dependencies in high dimension. If these dependencies are assumed
to follow a Markov factor graph model, this exploration process is called
structure discovery. For discrete-valued samples, estimates of the information
divergence over the parametric class of multinomial models lead to structure
discovery methods whose mean squared error achieves parametric convergence
rates as the sample size grows. However, a naive application of this method to
continuous nonparametric multivariate models converges much more slowly. In
this paper we introduce a new method for nonparametric structure discovery that
uses weighted ensemble divergence estimators that achieve parametric
convergence rates and obey an asymptotic central limit theorem that facilitates
hypothesis testing and other types of statistical validation.
Comment: 10 pages, 3 figure
The intrinsic value of HFO features as a biomarker of epileptic activity
High frequency oscillations (HFOs) are a promising biomarker of epileptic
brain tissue and activity. HFOs additionally serve as a prototypical example of
challenges in the analysis of discrete events in high-temporal resolution,
intracranial EEG data. Two primary challenges are 1) dimensionality reduction,
and 2) assessing feasibility of classification. Dimensionality reduction
assumes that the data lie on a manifold with dimension less than that of the
feature space. However, previous HFO analyses have assumed a linear manifold,
global across time, space (i.e. recording electrode/channel), and individual
patients. Instead, we assess both a) whether linear methods are appropriate and
b) the consistency of the manifold across time, space, and patients. We also
estimate bounds on the Bayes classification error to quantify the distinction
between two classes of HFOs (those occurring during seizures and those
occurring due to other processes). This analysis provides the foundation for
future clinical use of HFO features and guides the analysis for other discrete
events, such as individual action potentials or multi-unit activity.
Comment: 5 pages, 5 figure
Investigations of Temperature and Backscatter Correlation in the Dry Snow Zone of the Greenland Ice Sheet
Due to system degradation, satellite-borne scatterometers require post-launch calibrations to maintain accuracy. The dry snow zone of the Greenland ice sheet has been used for calibration due to its relatively constant backscatter properties. However, we recently discovered that some of the variation in the dry snow zone backscatter is seasonal. This paper uses correlation analysis to investigate the relationship between temperature and backscatter in the dry snow zone. The correlation coefficient is found to be significant, especially after spatially averaging the backscatter. However, an analysis and simulation demonstrate that spatial averaging can artificially increase the correlation coefficient.
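The effect of spatial averaging on the correlation coefficient can be reproduced with a small synthetic simulation. The sketch below assumes a hypothetical model in which each pixel's backscatter is a weak linear function of temperature plus independent noise; it is not the paper's simulation, and all names and parameter values are illustrative. Averaging over pixels suppresses the independent noise, so the coefficient rises even though the per-pixel coupling is unchanged.

```python
import math
import random


def pearson(a, b):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


random.seed(1)
n_days, n_pixels, coupling = 2000, 25, 0.3    # hypothetical settings

temperature = [random.gauss(0.0, 1.0) for _ in range(n_days)]
# Each pixel: weak dependence on temperature plus independent unit noise.
pixels = [
    [coupling * t + random.gauss(0.0, 1.0) for _ in range(n_pixels)]
    for t in temperature
]

single_pixel = [row[0] for row in pixels]
spatial_mean = [sum(row) / n_pixels for row in pixels]

r_single = pearson(temperature, single_pixel)
r_averaged = pearson(temperature, spatial_mean)
print(r_single, r_averaged)
```

Under this model the population correlation is coupling / sqrt(coupling^2 + noise variance), and averaging 25 pixels divides the noise variance by 25, so a large coefficient after averaging does not by itself indicate a strong per-pixel relationship.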
Image patch analysis and clustering of sunspots: a dimensionality reduction approach
Sunspots, as seen in white light or continuum images, are associated with
regions of high magnetic activity on the Sun, visible on magnetogram images.
Their complexity is correlated with explosive solar activity and so classifying
these active regions is useful for predicting future solar activity. Current
classification of sunspot groups is visually based and suffers from bias.
Supervised learning methods can reduce human bias but fail to optimally
capitalize on the information present in sunspot images. This paper uses two
image modalities (continuum and magnetogram) to characterize the spatial and
modal interactions of sunspot and magnetic active region images and presents a
new approach to cluster the images. Specifically, in the framework of image
patch analysis, we estimate the number of intrinsic parameters required to
describe the spatial and modal dependencies, the correlation between the two
modalities and the corresponding spatial patterns, and examine the phenomena at
different scales within the images. To do this, we use linear and nonlinear
intrinsic dimension estimators, canonical correlation analysis, and
multiresolution analysis of intrinsic dimension.
Comment: 5 pages, 7 figures, accepted to ICIP 201